Fully Bayesian Logistic Regression with Hyper-Lasso Priors for High-dimensional Feature Selection

Authors

  • Longhai Li
  • Weixin Yao
Abstract

High-dimensional feature selection arises in many areas of modern science. For example, in genomic research we want to find the genes that can be used to separate tissues of different classes (e.g., cancer and normal) from tens of thousands of genes that are active (expressed) in certain tissue cells. To this end, we wish to fit regression and classification models with a large number of features (also called variables or predictors), which remains a tremendous challenge to date. In the past few years, penalized likelihood methods for fitting regression models based on hyper-lasso penalization have been explored considerably in the literature. However, fully Bayesian methods that use Markov chain Monte Carlo (MCMC) to fit regression and classification models with hyper-lasso priors have received little investigation. In this paper, we introduce a new class of methods for fitting Bayesian logistic regression models with hyper-lasso priors using Hamiltonian Monte Carlo within a restricted Gibbs sampling framework. We call our methods BLRHL for short. We use simulation studies to test BLRHL by comparing it to LASSO, and to investigate the problems of choosing the heaviness and the scale of the prior in BLRHL. The main findings are that the choice of the heaviness of the prior plays a critical role in BLRHL, and that BLRHL is relatively robust to the choice of prior scale. We further demonstrate and investigate BLRHL in an application to a real microarray data set related to prostate cancer, which confirms the previous findings. An R add-on package called BLRHL will be available from http://math.usask.ca/~longhai/software/BLRHL.

Key phrases: high-dimensional, feature selection, heavy-tailed prior, hyper-lasso priors, MCMC, Hamiltonian Monte Carlo, Gibbs sampling, fully Bayesian.

∗Corresponding author, Associate Professor, Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, S7N 5E6, Canada. E-mail: [email protected]. †Associate Professor, Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. E-mail: [email protected].
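The abstract does not spell out the sampler, but the scheme it names — conjugate Gibbs updates of the prior scales alternating with Hamiltonian Monte Carlo updates of the coefficients — can be sketched as follows. This is a minimal illustration only, assuming a Student-t hyper-lasso prior written as a scale mixture of normals with fixed degrees of freedom `alpha` and scale `w`; it is not the BLRHL package, the paper's *restricted* Gibbs step is omitted, and all function and parameter names are hypothetical.

```python
# Minimal sketch (not the authors' BLRHL implementation): Bayesian logistic
# regression with a heavy-tailed prior on each coefficient, written as a
# scale mixture of normals
#   beta_j | s_j ~ N(0, s_j),   s_j ~ Inverse-Gamma(alpha/2, alpha*w/2),
# sampled by alternating a conjugate Gibbs update of the variances s_j with
# a plain (unrestricted) HMC update of beta.  No intercept, for simplicity.
import numpy as np

rng = np.random.default_rng(0)

def log_lik_and_grad(beta, X, y):
    """Logistic log-likelihood and its gradient."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    grad = X.T @ (y - 1.0 / (1.0 + np.exp(-eta)))
    return loglik, grad

def potential(beta, s, X, y):
    """U(beta) = -log posterior (up to a constant), and its gradient."""
    loglik, grad_ll = log_lik_and_grad(beta, X, y)
    U = -loglik + 0.5 * np.sum(beta ** 2 / s)
    grad_U = -grad_ll + beta / s
    return U, grad_U

def hmc_step(beta, s, X, y, eps=0.01, n_leapfrog=20):
    """One HMC update of beta given the prior variances s."""
    p = rng.standard_normal(beta.size)
    U0, grad = potential(beta, s, X, y)
    H0 = U0 + 0.5 * np.sum(p ** 2)
    b = beta.copy()
    p = p - 0.5 * eps * grad                 # leapfrog: first half momentum step
    for _ in range(n_leapfrog):
        b = b + eps * p
        U, grad = potential(b, s, X, y)
        p = p - eps * grad
    p = p + 0.5 * eps * grad                 # undo half of the last momentum step
    H1 = U + 0.5 * np.sum(p ** 2)
    return b if np.log(rng.uniform()) < H0 - H1 else beta

def blrhl_sketch(X, y, alpha=1.0, w=0.01, n_iter=2000):
    """Alternate Gibbs updates of the prior variances with HMC for beta."""
    n, d = X.shape
    beta = np.zeros(d)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        # conjugate update: s_j | beta_j ~ Inv-Gamma((alpha+1)/2, (alpha*w + beta_j^2)/2)
        s = 1.0 / rng.gamma((alpha + 1.0) / 2.0, 2.0 / (alpha * w + beta ** 2))
        beta = hmc_step(beta, s, X, y)
        draws[t] = beta
    return draws
```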


Similar Articles

The Bayesian Lasso

The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the priors on the regression parameters are independent double-exponential (Laplace) distributions. This posterior can also be accessed through a Gibbs sampler using conjugate normal priors for the regression parameters, with independent exponential hyperpriors on their variances. T...
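The Gibbs sampler alluded to above (Park and Casella's Bayesian Lasso) has closed-form full conditionals. Below is a minimal sketch, assuming a centered response, a fixed lasso parameter `lam` (no hyperprior on it), and an improper 1/sigma2 prior on the error variance; the function and variable names are illustrative, not from any package.

```python
# Minimal sketch of the Bayesian Lasso Gibbs sampler for linear regression.
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.zeros(p)
    sigma2 = 1.0
    inv_tau2 = np.ones(p)              # 1 / tau_j^2, latent variance scales
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}),  A = X'X + diag(1/tau_j^2)
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # sigma2 | rest ~ Inv-Gamma((n-1+p)/2, (||y - X beta||^2 + beta' D^{-1} beta)/2)
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + np.sum(inv_tau2 * beta ** 2))
        sigma2 = 1.0 / rng.gamma(0.5 * (n - 1 + p), 1.0 / rate)
        # 1/tau_j^2 | rest ~ Inverse-Gaussian(sqrt(lam^2 sigma2 / beta_j^2), lam^2)
        mu = np.sqrt(lam ** 2 * sigma2 / np.maximum(beta ** 2, 1e-12))
        inv_tau2 = rng.wald(mu, lam ** 2)
        draws[t] = beta
    return draws
```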


A Bayesian Lasso via reversible-jump MCMC

Variable selection is a topic of great importance in high-dimensional statistical modeling and has a wide range of real-world applications. Many variable selection techniques have been proposed in the context of linear regression, and the Lasso model is probably one of the most popular penalized regression techniques. In this paper, we propose a new, fully hierarchical, Bayesian version of the ...


CS535D Project: Bayesian Logistic Regression through Auxiliary Variables

This project deals with the estimation of Logistic Regression parameters. We first review the binary logistic regression model and the multinomial extension, including standard MAP parameter estimation with a Gaussian prior. We then turn to the case of Bayesian Logistic Regression under this same prior. We review the canonical approach of performing Bayesian Probit Regression through auxiliary...
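The truncated sentence refers to the auxiliary-variable construction of Albert and Chib (1993) for probit regression, in which latent Gaussian variables are imputed so that the coefficient update becomes an ordinary Gaussian draw. Here is a minimal sketch, assuming a N(0, tau2*I) prior on the coefficients; the names are illustrative.

```python
# Minimal sketch of the auxiliary-variable Gibbs sampler for Bayesian probit
# regression (Albert & Chib, 1993).
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, tau2=100.0, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    # posterior covariance of beta given z is fixed: V = (X'X + I/tau2)^{-1}
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)
    L = np.linalg.cholesky(V)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # z_i | beta, y_i ~ N(x_i'beta, 1), truncated to (0, inf) if y_i = 1
        # and to (-inf, 0] if y_i = 0
        mu = X @ beta
        lower = np.where(y == 1, -mu, -np.inf)   # bounds standardized as (bound - mu)/1
        upper = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
        # beta | z ~ N(V X'z, V)
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```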


The Iterated Lasso for High-Dimensional Logistic Regression

We consider an iterated Lasso approach for variable selection and estimation in sparse, high-dimensional logistic regression models. In this approach, we use the Lasso (Tibshirani 1996) to obtain an initial estimator and reduce the dimension of the model. We then use the Lasso as the initial estimator in the adaptive Lasso (Zou 2006) to obtain the final selection and estimation results. We prov...
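The two-stage procedure summarized above (an initial Lasso fit to reduce the dimension, followed by an adaptive Lasso that reweights the penalty by the initial estimates) can be sketched with off-the-shelf L1-penalized logistic regression. This is not the authors' exact implementation; the regularization strengths `C1` and `C2` are placeholders that would normally be tuned by cross-validation.

```python
# Minimal sketch of a Lasso-then-adaptive-Lasso fit for logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterated_lasso_logistic(X, y, C1=1.0, C2=1.0):
    # Step 1: ordinary L1 fit gives initial estimates and drops features
    # whose coefficients are exactly zero.
    init = LogisticRegression(penalty="l1", C=C1, solver="liblinear").fit(X, y)
    b0 = init.coef_.ravel()
    keep = np.flatnonzero(b0 != 0.0)
    # Step 2: adaptive Lasso on the retained features, implemented by
    # rescaling column j by |b0_j| so the L1 penalty becomes
    # sum_j |beta_j| / |b0_j| on the original scale.
    w = np.abs(b0[keep])
    Xw = X[:, keep] * w
    ada = LogisticRegression(penalty="l1", C=C2, solver="liblinear").fit(Xw, y)
    beta = np.zeros(X.shape[1])
    beta[keep] = ada.coef_.ravel() * w       # map back to the original scale
    return beta, ada.intercept_[0]
```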


Applying Penalized Binary Logistic Regression with Correlation Based Elastic Net for Variables Selection

Reducing the dimension of high-dimensional classification problems with penalized logistic regression is one of the challenges in applying binary logistic regression. The applied penalized method, the correlation-based elastic penalty (CBEP), was used to overcome the limitations of the LASSO and the elastic net in variable selection when there is perfect correlation among explanatory variables. The performance of the CBEP ...
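CBEP modifies the elastic-net penalty using correlations among the explanatory variables; that reweighting is not reproduced here, but the baseline it builds on, elastic-net penalized logistic regression, can be fit directly with scikit-learn. A minimal sketch with illustrative values of `C` and `l1_ratio`:

```python
# Minimal sketch of standard elastic-net penalized logistic regression
# (the baseline that CBEP modifies), with variable selection read off from
# the coefficients that the penalty sets exactly to zero.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def elastic_net_logistic(X, y, C=1.0, l1_ratio=0.5):
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=l1_ratio, C=C, max_iter=5000),
    )
    model.fit(X, y)
    coefs = model[-1].coef_.ravel()
    selected = np.flatnonzero(coefs != 0.0)   # features kept by the penalty
    return coefs, selected
```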



Journal title:

Volume   Issue

Pages  -

Publication date: 2014